TESTS FOR A CHANGE-POINT


Barry James, Kang Ling James, and David Siegmund


The problem considered is that of testing a sequence of independent normal random variables with constant, known or unknown, variance for no change in mean versus alternatives with a single change-point. Various tests, such as those based on the likelihood ratio and recursive residuals, are studied. Power approximations are developed by integrating approximations for conditional boundary crossing probabilities. A comparison of several tests is made, and the power approximations obtained are compared with Monte Carlo values.


1. Introduction and Summary.


Let $x_1, \ldots, x_m$ be independent random variables. The purpose of this paper is to discuss tests of the hypothesis that the $x$'s are identically distributed against the alternative that for some value $j$, $1 \le j < m$, $x_1, \ldots, x_j$ are identically distributed and $x_{j+1}, \ldots, x_m$ are also identically distributed, but with a distribution different from that of $x_1$. For the most part we consider only the special case where the $x_n$ are normally distributed with mean $\mu_n$ and variance 1. Then the hypotheses can be described more formally as

$$H_0:\ \mu_1 = \mu_2 = \cdots = \mu_m; \qquad H_1:\ \text{for some } j,\ 1 \le j < m,\ \mu_1 = \cdots = \mu_j \ne \mu_{j+1} = \cdots = \mu_m. \qquad (1)$$


One  important  goal  in  studying  this  very  simple  problem  in  detail  is  to  gain  insight 
into  more  complicated  models  and  related  problems.  Thus,  although  we  do  not  consider 
regression  models  explicitly,  we  attempt  to  keep  this  generalisation  in  mind  and  comment  on 
it  when  appropriate.  Similarly,  we  do  not  study  the  related  problems  of  estimating  y  and/or 
the  magnitude  of  the  change,  but  when  other  things  are  equal  we  prefer  test  statistics  which 
seem to be useful for estimation as well. For discussions of confidence sets for the change-point $j$ see Cobb (1978) and Siegmund (1986).


The sampling distributions of most of the statistics described below are quite complicated, and other authors have often studied these problems by numerical or Monte Carlo methods (e.g. Sen and Srivastava, 1975; Hawkins, 1977; Worsley, 1983). Using methods developed to solve boundary crossing problems in sequential analysis, we give analytic approximations to the sampling distributions of various test statistics. This facilitates our comparisons of different tests and allows one to make informal use of the procedures without any programming effort. The process of obtaining the approximations also yields some qualitative insights into the tests themselves.

The  paper  is  organized  as  follows.  Section  2  introduces  several  test  statistics  and 
describes  their  behavior  qualitatively.  The  most  important  are  (1)  the  likelihood  ratio 
statistic,  (2)  an  ad  hoc  statistic  proposed  by  Pettitt  (1980),  which  can  be  interpreted  as 
a  kind  of  score  statistic,  (3)  the  recursive  residuals  statistic  of  Brown,  Durbin,  and  Evans 
(1975), and (4) the quasi Bayes statistic of Chernoff and Zacks (1964). Approximations
for  the  significance  levels  of  these  tests  are  given  in  Section  3  and  approximations  for  the 
power  in  Section  4.  Section  5  contains  numerical  examples.  Section  6  contains  miscellaneous 
remarks  and  speculations  concerning  extensions  of  our  results  to  non-normal  data,  survival 
analysis  and  regression  problems.  An  appendix  gives  some  mathematical  results.  A  more 
detailed  presentation  of  the  underlying  mathematics  will  be  given  in  a  future  paper. 

The  reader  more  interested  in  our  conclusions  than  in  the  theoretical  development  may 
wish  to  turn  directly  from  Section  2  to  Sections  5  and  6,  which  can  be  read  independently 
of  Sections  3  and  4. 

2.  Test  Statistics  and  Qualitative  Behavior. 

Let $x_1, x_2, \ldots, x_m$ be independent random variables, and assume that $x_n$ is normally distributed with mean $\mu_n$ and variance 1 ($n = 1, 2, \ldots, m$). Let $S_n = x_1 + \cdots + x_n$. The square root of twice the log likelihood ratio statistic for testing the hypotheses (1) is easily calculated to be

$$\max_{1 \le k < m} \frac{|kS_m/m - S_k|}{\{k(1 - k/m)\}^{1/2}}. \qquad (2)$$

Without the max, (2) is the normalized difference between the mean of the first $k$ observations and the overall mean, i.e. it is the standard two sample test statistic for testing that the means of the first $k$ observations and the last $m - k$ are equal. The max searches for the most plausible place to separate the sample into two subsamples having different means. A derivation of this statistic which suggests some alternative possibilities goes as follows. The testing problem (1) is invariant under common shifts in location of all the observations, so one might restrict consideration to invariant procedures, i.e., those which depend not directly on the $x$'s but only on the differences $y_n = x_n - x_1$, $n = 2, \ldots, m$ (cf. Lehmann, 1959, p. 216). For given values of $j$ and $\delta = \mu_m - \mu_1$, the likelihood ratio of the $y$'s under $H_1$ to the $y$'s under $H_0$ is easily calculated to be

$$\exp\{\delta(jS_m/m - S_j) - j(1 - j/m)\delta^2/2\}. \qquad (3)$$

Maximizing (3) first over $\delta$, which gives $\hat\delta = (jS_m/m - S_j)/\{j(1 - j/m)\}$, and then over $j$ yields (2).

In what follows it will be more convenient to consider one sided alternatives, for which we assume that the sign of $\delta$ is known, say $\delta > 0$. Then the likelihood ratio statistic is (2) without the absolute values. For reasons given below we consider the generalization

$$\max_{m_0 \le k \le m_1} \frac{kS_m/m - S_k}{\{k(1 - k/m)\}^{1/2}}, \qquad (4)$$

where $1 \le m_0 < m_1 < m$.

By differentiating the logarithm of (3) with respect to $\delta$, setting $\delta = 0$ and then maximizing over $j$, we obtain a score-like statistic suggested by Pettitt (1980),

$$\max_{1 \le k < m} (kS_m/m - S_k). \qquad (5)$$

Still another possibility is to take the log of (3) for some arbitrary value $\delta_0$, which might be interpreted as a typical change or the minimal change one is interested in detecting, divide by $\delta_0$, and maximize over $j$. This yields the statistic

$$\max_{1 \le k < m} \{kS_m/m - S_k - k(1 - k/m)\delta_0/2\}. \qquad (6)$$
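As an illustration only, the statistics (2), (4), (5), and (6) can all be computed from the single process $kS_m/m - S_k$, $k = 0, 1, \ldots, m$. The Python sketch below assumes unit variance and uses 1-based $k$ as in the text; the function names (`bridge`, `lr_two_sided`, `lr_one_sided`, `pettitt`, `fixed_delta`) are hypothetical and not taken from any library.

```python
import math


def bridge(x):
    """Return [k S_m/m - S_k for k = 0, ..., m], where S_k = x_1 + ... + x_k."""
    m = len(x)
    s = [0.0]
    for v in x:
        s.append(s[-1] + v)
    return [k * s[m] / m - s[k] for k in range(m + 1)]


def lr_two_sided(x):
    """Statistic (2): max over 1 <= k < m of |k S_m/m - S_k| / {k(1 - k/m)}^(1/2)."""
    m, b = len(x), bridge(x)
    return max(abs(b[k]) / math.sqrt(k * (1 - k / m)) for k in range(1, m))


def lr_one_sided(x, m0, m1):
    """Statistic (4): one-sided version, maximized over m0 <= k <= m1."""
    m, b = len(x), bridge(x)
    return max(b[k] / math.sqrt(k * (1 - k / m)) for k in range(m0, m1 + 1))


def pettitt(x):
    """Statistic (5): max over 1 <= k < m of k S_m/m - S_k."""
    return max(bridge(x)[1:len(x)])


def fixed_delta(x, delta0):
    """Statistic (6): log of (3) at delta0, divided by delta0, maximized over k."""
    m, b = len(x), bridge(x)
    return max(b[k] - k * (1 - k / m) * delta0 / 2 for k in range(1, m))
```

For example, for twenty observations that step from 0 to 1 after the tenth, all four statistics attain their maximum at $k = 10$.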


An interesting ad hoc test statistic is the so-called recursive residual statistic proposed by Brown, Durbin, and Evans (1975). We consider the standardized residual of $x_{n+1}$ from the mean of $x_1, \ldots, x_n$, to wit

$$z_n = \{n/(n + 1)\}^{1/2}(x_{n+1} - S_n/n), \qquad n = 1, 2, \ldots, m - 1. \qquad (7)$$

We form the cumulative sum $\tilde S_n = z_1 + \cdots + z_n$, and we use as a test statistic

$$\max_{1 \le n \le m-1} \tilde S_n/n^{1/2}. \qquad (8)$$

Actually there is considerable arbitrariness in this definition. One might equally well cumulate sums "from the right" to obtain

$$\max_{2 \le k \le m} (\tilde S_{m-1} - \tilde S_{m-k})/(k - 1)^{1/2}, \qquad (9)$$

and one might consider either (8) or (9) with $z_n$ defined as the residual of $x_n$ from the mean of $x_{n+1}, \ldots, x_m$. We shall argue below that (9) is typically preferable to (8), but in general there seems to be no preferred way to define $z_n$.

It is easy to see that under $H_0$ the recursive residuals $z_1, \ldots, z_{m-1}$ are independent standard normal random variables. The appeal of the recursive residual concept is that this property persists under general regression models (Brown, Durbin, and Evans, 1975). By way of contrast, in a regression context the null hypothesis distribution of the likelihood ratio statistic depends on the spacings between the independent variables.

Chernoff and Zacks (1964) assume that $j$ has a uniform prior distribution over $\{1, 2, \ldots, m\}$ and that $\delta$ is close to zero. An expansion for small $\delta$ gives the quasi Bayesian test statistic

$$C = \sum_{n=1}^{m-1} \{n(n + 1)\}^{1/2} z_n, \qquad (10)$$

where $z_n$ is the recursive residual defined in (7). Gardner (1969) follows the Chernoff-Zacks prescription to obtain a test statistic for two sided alternatives, which turns out to be quite different from (10) in appearance. A disadvantage of Gardner's statistic is that in general it gives no idea whether $\delta$ is positive or negative. For this reason one might prefer to use $|C|$ for a two sided test. For simplicity we consider only the statistic (10) for a one sided alternative. Unlike the other statistics suggested above, the sampling distribution of (10) is normal under both the null and alternative hypotheses. Other things being equal, this would be a point in favor of (10). We shall see that other things are not equal.
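A minimal Python sketch of the recursive residuals (7) and of the statistics (8), (9), and (10) follows; the function names are hypothetical. As a consistency check (our own algebra, not stated in the text), writing $kS_m/m - S_k = k\sum_{n=k}^{m-1} z_n/\{n(n+1)\}^{1/2}$ and summing over $k$ gives $C = 2\sum_{k=1}^{m-1}(kS_m/m - S_k)$, so (10) is a linear functional of the same process that underlies (2), (4), and (5); the test below checks this numerically.

```python
import math


def recursive_residuals(x):
    """z_n = {n/(n+1)}^(1/2) (x_{n+1} - mean of x_1..x_n), n = 1, ..., m-1, as in (7)."""
    z, s = [], 0.0
    for n in range(1, len(x)):
        s += x[n - 1]                      # s = x_1 + ... + x_n
        z.append(math.sqrt(n / (n + 1)) * (x[n] - s / n))
    return z


def cusum_left(z):
    """Statistic (8): max over n of (z_1 + ... + z_n) / n^(1/2)."""
    best, s = -math.inf, 0.0
    for n, zn in enumerate(z, start=1):
        s += zn
        best = max(best, s / math.sqrt(n))
    return best


def cusum_right(z):
    """Statistic (9): cumulate from the right, i.e. sums of the last k - 1 residuals."""
    best, s = -math.inf, 0.0
    for i, zn in enumerate(reversed(z), start=1):
        s += zn                            # s = sum of the last i residuals
        best = max(best, s / math.sqrt(i))
    return best


def chernoff_zacks(z):
    """Statistic (10): C = sum over n of {n(n+1)}^(1/2) z_n."""
    return sum(math.sqrt(n * (n + 1)) * zn for n, zn in enumerate(z, start=1))
```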

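Under the alternative hypothesis, we have the following relations:

$$E(kS_m/m - S_k) = \begin{cases} k(1 - j/m)\delta & (k \le j), \\ (j/m)(m - k)\delta & (k > j); \end{cases} \qquad (11)$$

$$E(z_n) = \begin{cases} 0 & (n < j), \\ j\delta/\{n(n + 1)\}^{1/2} & (n \ge j); \end{cases} \qquad (12)$$

$$E(\tilde S_k) = \begin{cases} 0 & (k < j), \\ j\delta \sum_{n=j}^{k} \{n(n + 1)\}^{-1/2} & (k \ge j); \end{cases} \qquad (13)$$

$$E(C) = j(m - j)\delta.$$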


Some qualitative insights follow from (11)-(13). If we represent the rejection region of (2) (without the absolute value signs) as indicated in Figure 1, it seems intuitively clear that the primary contribution to the power of that test comes from the probability that the process $kS_m/m - S_k$ exceeds $b\{k(1 - k/m)\}^{1/2}$ for some $k$ in a neighborhood of $k = j$. If we superimpose the rejection region of (5), i.e. $\{\max_k (kS_m/m - S_k) > b_1\}$, on the same picture, then in order that the two tests have the same significance level, $b_1$ must be less than $b\{k(1 - k/m)\}^{1/2}$ in a neighborhood of $k \approx m/2$. Hence we expect that (5) has greater power than (2) when $j$ is about $m/2$ and the line of drift is more likely to carry the process above the constant $b_1$ than above $b\{k(1 - k/m)\}^{1/2}$. The converse is true when $j$ is near 0 or $m$, and the curve $b\{k(1 - k/m)\}^{1/2}$ near the change-point lies below $b_1$. Introduction of $m_0$ and $m_1$ in (4) gives the statistician the flexibility to trade some decrease of power to detect changes occurring near $j = 0$ and $j = m$ for an increase in power to detect changes occurring near $j = m/2$.

Similar  reasoning  suggests  that  (6),  like  (4),  is  a  compromise  between  (2)  and  (5);  and 
preliminary  calculations  suggest  this  is  indeed  the  case.  Since  (6)  seems  less  easily  adapted 
to  multiparameter  problems,  we  shall  not  discuss  it  in  this  paper. 

One can make a similar crude comparison of (8) and (9). Now, in effect the rejection regions are the same, but the processes are different (cf. Figure 2). It is easy to see that if $j \approx mt^*$ for some fixed $0 < t^* < 1$ and $m$ is large, then

$$\operatorname{pr}[\tilde S_{m-1} - \tilde S_{mt^*} > b\{m(1 - t^*)\}^{1/2}] \approx \max_n \operatorname{pr}\{\tilde S_{m-1} - \tilde S_{m-1-n} > bn^{1/2}\} > \max_n \operatorname{pr}(\tilde S_n > bn^{1/2}),$$

provided $\{t^*/(1 - t^*)\}^{1/2}\log(1/t^*) > 2e^{-1}$. To a considerable extent the power of (8) and (9) is determined by the maximum marginal probabilities, and hence it appears that (9) is usually preferable to (8). Although some additional investigation of (8) may be warranted, we do not pursue it here.
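The comparison of (8) and (9) can be illustrated without simulation by computing, from the residual means $E(z_n) = 0$ for $n < j$ and $E(z_n) = j\delta/\{n(n+1)\}^{1/2}$ for $n \ge j$, the largest standardized mean (mean of the partial sum divided by the square root of its number of terms) reachable by each way of cumulating. This Python sketch is only a heuristic proxy for power, since it ignores the random fluctuations, and its function names are hypothetical.

```python
import math


def residual_means(m, j, delta):
    """E(z_n) for n = 1, ..., m-1 when the mean shifts by delta after observation j."""
    return [0.0 if n < j else j * delta / math.sqrt(n * (n + 1)) for n in range(1, m)]


def max_standardized_mean(means):
    """Max over n of (sum of the first n entries) / n^(1/2)."""
    best, s = -math.inf, 0.0
    for n, e in enumerate(means, start=1):
        s += e
        best = max(best, s / math.sqrt(n))
    return best


def compare_8_and_9(m, j, delta=1.0):
    """Return (largest standardized drift for (8), same for (9))."""
    mu = residual_means(m, j, delta)
    left = max_standardized_mean(mu)          # cumulate from the left, as in (8)
    right = max_standardized_mean(mu[::-1])   # cumulate from the right, as in (9)
    return left, right
```

With $m = 40$ the right-hand cumulation has the larger drift for a change at $j = 20$, while for a change very early in the sequence ($j = 2$, so $t^* = .05$, where the proviso above fails) the ordering reverses.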

Remark. Although Brown, Durbin, and Evans (1975) and Sen in several papers (e.g. Sen, 1982) consider recursive residuals as in (8), often normalized by a linear function of $n$ rather than by $n^{1/2}$, Cox in the discussion to Brown, Durbin, and Evans (1975) implicitly proposes (9).

It is worth noting that (11) and (4) or (5) suggest simple estimates of $j$ and $\delta$. For example, (4) suggests estimating $j$ by the value $\hat j$ which yields the maximum, and then (11) suggests estimating $\delta$ by

$$\hat\delta = (\hat jS_m/m - S_{\hat j})/\{\hat j(1 - \hat j/m)\}.$$
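The estimates just described can be sketched in a few lines of Python: $\hat j$ is the maximizing index of the standardized statistic in (4), and $\hat\delta = (\hat jS_m/m - S_{\hat j})/\{\hat j(1 - \hat j/m)\}$. The function name `estimate_change` is hypothetical, and ties in the maximum are broken by taking the smallest index.

```python
import math


def estimate_change(x, m0, m1):
    """Return (j_hat, delta_hat): j_hat maximizes the standardized statistic in (4)
    over m0 <= k <= m1; delta_hat = (j S_m/m - S_j) / {j(1 - j/m)} at j = j_hat."""
    m = len(x)
    s = [0.0]
    for v in x:
        s.append(s[-1] + v)

    def standardized(k):
        return (k * s[m] / m - s[k]) / math.sqrt(k * (1 - k / m))

    j = max(range(m0, m1 + 1), key=standardized)
    delta = (j * s[m] / m - s[j]) / (j * (1 - j / m))
    return j, delta
```

For a noiseless step of size 1 after the tenth of twenty observations this returns $\hat j = 10$ and $\hat\delta = 1$.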

Similarly, a comparison of $\tilde S_{m-1} - \tilde S_{m-k}$, $k = 1, 2, \ldots, m$, with the corresponding expected values (cf. Figure 2) gives some idea of the values of $j$ and $\delta$, albeit less well defined than for (4) or (5). Although one can compute numerically a formal Bayes estimate, there do not appear to be natural estimators associated with (10).

FIGURE 1

FIGURE 2


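For future reference we also record the log likelihood ratio statistic for testing (1) when the variance of the $x$'s is an unknown, but unchanging, constant $\sigma^2$. The statistic is

$$\max_{1 \le k < m} \left[ -\frac{m}{2} \log\left\{ 1 - \frac{(S_k - kS_m/m)^2}{k(1 - k/m)\sum_{i=1}^{m}(x_i - \bar x_m)^2} \right\} \right], \qquad (15)$$

where $\bar x_m = S_m/m$.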
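A small Python sketch of the unknown-variance statistic (15) follows, with a hypothetical function name. It relies on the identity (our own algebra) that $(S_k - kS_m/m)^2/\{k(1 - k/m)\}$ is exactly the reduction in residual sum of squares obtained by fitting separate means to $x_1, \ldots, x_k$ and $x_{k+1}, \ldots, x_m$, so that (15) equals $\max_k (m/2)\log(\mathrm{RSS}_0/\mathrm{RSS}_k)$; the test compares the two forms. It assumes the data are not piecewise constant, otherwise the logarithm is undefined.

```python
import math


def lr_unknown_variance(x):
    """Log likelihood ratio statistic (15): change in mean, common unknown variance."""
    m = len(x)
    s = [0.0]
    for v in x:
        s.append(s[-1] + v)
    xbar = s[m] / m
    total_ss = sum((v - xbar) ** 2 for v in x)
    best = -math.inf
    for k in range(1, m):
        # reduction in residual SS from allowing a change in mean after x_k
        between = (s[k] - k * xbar) ** 2 / (k * (1 - k / m))
        best = max(best, -m / 2 * math.log(1 - between / total_ss))
    return best
```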

Approximations to the significance level of (4), (5), (9), and (15) are given in the following section.


3. Approximations to Significance Levels.


In this section we give approximations to the right hand tail of the distributions under $H_0$ of (4), (5), and (9), or equivalently, (8). These approximations have been developed in
the  context  of  sequential  analysis.  For  example,  the  null  hypothesis  distribution  of  (9)  yields 
the  significance  level  of  a  so-called  repeated  significance  test,  first  studied  by  Armitage, 
McPherson,  and  Rowe  (1969)  by  numerical  methods.  For  derivation  of  the  approximations 
and  documentation  of  their  accuracy  for  very  small  samples,  e.g.  m  =  5,  see  Siegmund 
(1985,  1986). 

We also give approximations for the significance level of (15) and appropriate modifications of (5) and (9) for the case of an unknown variance. The derivation of these approximations requires some new techniques, which are described in the simplest context in an appendix and will be given in more detail in a future publication.

Several authors have noted that the likelihood ratio statistic in various change-point problems, not restricted to the normal case considered here, has a large sample approximation under the null hypothesis of no change, which corresponds to (4) with the random walk $\{S_n, 0 \le n \le m\}$ replaced by standard Brownian motion $\{W(t), 0 \le t \le m\}$. See, for example, Kendall and Kendall (1980) and Matthews, Farewell, and Pyke (1985). Although this approximation is often fairly crude, its generality makes it useful. A simple and accurate approximation to the Brownian motion probability is given below. With a view towards more general problems, it is given for the $d$-dimensional case ($d \ge 1$).

In order to state approximations to the significance levels of (4), (5), and (9) it is helpful to introduce the function

$$\nu(x) = 2x^{-2}\exp\left\{-2\sum_{n=1}^{\infty} n^{-1}\Phi\left(-\tfrac{1}{2}xn^{1/2}\right)\right\} \qquad (x > 0), \qquad (16)$$

where $\Phi$ denotes the standard normal distribution function. The function $\nu$ is easily evaluated numerically; alternatively, in the range $0 < x < 2$ one can use the local expansion

$$\nu(x) = \exp(-\rho x) + o(x^2) \qquad (x \to 0), \qquad (17)$$

where $\rho$ is a numerical constant which approximately equals .583. See Siegmund (1985, Chapter X).
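The function $\nu$ in (16) is indeed easy to evaluate: the series converges quickly because $\Phi(-\tfrac12 xn^{1/2})$ decays like $\exp(-x^2n/8)$. The Python sketch below sums it until the terms become negligible and compares the result with the expansion (17); the names `Phi`, `nu`, and `nu_approx` and the stopping tolerance are our own choices.

```python
import math


def Phi(x):
    """Standard normal distribution function."""
    return 0.5 * math.erfc(-x / math.sqrt(2.0))


def nu(x, tol=1e-15, max_terms=1_000_000):
    """nu(x) = 2 x^-2 exp{-2 sum_{n>=1} n^-1 Phi(-x n^(1/2) / 2)}, x > 0, as in (16)."""
    total = 0.0
    for n in range(1, max_terms + 1):
        term = Phi(-x * math.sqrt(n) / 2.0) / n
        total += term
        if term < tol:          # terms decrease monotonically, so the tail is negligible
            break
    return 2.0 / x ** 2 * math.exp(-2.0 * total)


def nu_approx(x, rho=0.583):
    """Local approximation (17): nu(x) is close to exp(-rho x) for small x."""
    return math.exp(-rho * x)
```

For $x$ between about .25 and 1 the two functions agree to within a few hundredths, which is ample for the boundary-crossing approximations that follow.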


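Let $y_1, y_2, \ldots, y_m$ be independent standard normal random variables and $S_n = y_1 + \cdots + y_n$ ($n = 1, 2, \ldots, m$). Then for $1 \le m_0 < m_1 < m$ and $b > 0$

$$\operatorname{pr}\left[\max_{m_0 \le n \le m_1} \frac{nS_m/m - S_n}{\{n(1 - n/m)\}^{1/2}} > b\right] \approx 1 - \Phi(b) + b\phi(b)\int_{b(1/m_1 - 1/m)^{1/2}}^{b(1/m_0 - 1/m)^{1/2}} x^{-1}\nu\{x + b^2/(mx)\}\,dx, \qquad (18)$$

and

$$\operatorname{pr}\left(\max_{m_0 \le n \le m_1} S_n/n^{1/2} > b\right) \approx 1 - \Phi(b) + b\phi(b)\int_{b/m_1^{1/2}}^{b/m_0^{1/2}} x^{-1}\nu(x)\,dx, \qquad (19)$$

where $\Phi$ is the standard normal distribution, $\phi = \Phi'$, and $\nu$ is given by (16) or approximately by (17). Also

$$\operatorname{pr}\left\{\max_{1 \le n < m}(nS_m/m - S_n) > b\right\} \approx \exp\{-2m^{-1}(b + \rho)^2\}, \qquad (20)$$

where $\rho \approx .583$, as above.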


The approximations (18)-(20) are respectively (11.33), (4.40), and (10.43) of Siegmund (1985). They can be shown numerically to provide excellent approximations by comparing them to exact numerical computations of Pocock (1977) and Worsley (1983). Some comparisons are given by Siegmund (1985, 1986). Siegmund (1985, p. 83) provides a table for evaluating the integral in (19).
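The approximations (18)-(20) are simple to evaluate numerically. The Python sketch below replaces $\nu$ by its approximation (17), as the text permits, and computes the integrals by Simpson's rule after the substitution $u = \log x$ (so that $x^{-1}dx = du$). The integrands and limits are as we read them in (18) and (19): for (18), $\int x^{-1}\nu\{x + b^2/(mx)\}dx$ from $b(1/m_1 - 1/m)^{1/2}$ to $b(1/m_0 - 1/m)^{1/2}$; for (19), $\int x^{-1}\nu(x)dx$ from $b/m_1^{1/2}$ to $b/m_0^{1/2}$. All function names are hypothetical. For the example used in Section 5 ($b = 2.82$, $m_0 = 5$, $m_1 = 35$, $m = 40$) the first function gives a value close to .025.

```python
import math

RHO = 0.583  # the constant rho in (17)


def Phi(x):
    return 0.5 * math.erfc(-x / math.sqrt(2.0))


def phi(x):
    return math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)


def nu_approx(x):
    """Approximation (17) to nu; adequate for the small arguments arising here."""
    return math.exp(-RHO * x)


def simpson(f, a, b, n=200):
    """Composite Simpson rule on [a, b] with an even number n of subintervals."""
    h = (b - a) / n
    total = f(a) + f(b)
    for i in range(1, n):
        total += (4 if i % 2 else 2) * f(a + i * h)
    return total * h / 3


def p18(b, m0, m1, m, nu=nu_approx):
    """(18): tail of the one-sided likelihood ratio statistic (4); needs m0 < m1 < m."""
    lo, hi = b * math.sqrt(1 / m1 - 1 / m), b * math.sqrt(1 / m0 - 1 / m)
    g = lambda u: nu(math.exp(u) + b * b / (m * math.exp(u)))
    return 1 - Phi(b) + b * phi(b) * simpson(g, math.log(lo), math.log(hi))


def p19(b, m0, m1, nu=nu_approx):
    """(19): tail of max over m0 <= n <= m1 of S_n / n^(1/2)."""
    lo, hi = b / math.sqrt(m1), b / math.sqrt(m0)
    g = lambda u: nu(math.exp(u))
    return 1 - Phi(b) + b * phi(b) * simpson(g, math.log(lo), math.log(hi))


def p20(b, m):
    """(20): tail of Pettitt's statistic (5)."""
    return math.exp(-2 * (b + RHO) ** 2 / m)
```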


In the case of an unknown and constant variance, if the statistics (4), (5), and (9) are Studentized in the "obvious" way, we obtain the following approximations as analogues of (18)-(20) respectively. Let $\gamma = b/m^{1/2}$ and assume that $0 < \gamma < 1$. Then


pr  {nSm/rn  -  S„)/{n{l  -  nfm)y^^  |m"*  ^(y„  -  |  j  -  *  j 

a  (m/2x)^/2  j\l  - 

-H  (2x)-‘/26(1  -  6Vm)<"*-'‘)/’  /  x-^u[z  "  7®)*}]^® 

(21) 

and 


pr 


>  ij  3  (m/2sr)»/*  ^\l  - 
/■Vlmoll-I*))*/* 

+  (2t)-^/*6(1  -  6Vm)<’"-’)/2  /  z-*i/(i)d*. 


(22) 


-,*)}•/* 

The first integrals in (21) and (22) can themselves be approximated by virtue of the expansion


m 


f\l  -  z2)("*-V2rf*  =  6-»(i  _  6Vm)<”’->/»{l  -I-  2m-*  -  6"*  +  o(m-*},  (23) 

Jt 

which is valid as $b, m \to \infty$ with $b/m^{1/2} = \gamma$ fixed. Now let $\gamma = b/m$ and assume that $0 < \gamma < 1/2$. Then


pr 


-  Sk)/  ^(y?  -  y*)’  |  > 

S  i/{47/(1  -  47*)‘'^}(1  -  46V»«*)‘"*"’’^*- 


(24) 


The  approximations  (21),  (22),  and  (24)  are  written  in  a  way  to  facilitate  comparison 
with the corresponding results, (18)-(20), for known variance. It appears that there is little
difference  between  the  two  cases.  Some  numerical  results  given  in  the  appendix  indicate 
that  this  is  in  fact  the  case  except  when  the  probability  or  the  sample  size  m  is  quite  small. 
The  appendix  also  contains  an  informal  proof  of  (24).  The  more  complicated  (21)  and  (22) 
will  be  discussed  in  a  future  paper. 


Remarks. (i) It is easy to see that the probability that the likelihood ratio statistic (15) exceeds $a$ is given approximately by twice (21) with $b = [m\{1 - \exp(-2a/m)\}]^{1/2}$, because (15) exceeds $a$ exactly when the Studentized statistic of (21), with $m_0 = 1$ and $m_1 = m - 1$, exceeds this $b$ in absolute value. (ii) At first glance (22) may appear to be an incorrect Studentization of (9). But note that the $y$'s in (22) play the role of the $z$'s in (9); and if $z_n$ is defined by (7), it is easy to use the Helmert orthogonal transformation to show that $\sum_{n=1}^{m-1} z_n^2 = \sum_{n=1}^{m}(x_n - \bar x_m)^2$. (iii) The approximation (22) can be used for the general regression model of Brown, Durbin, and Evans (1975), but the distribution of the likelihood ratio statistic is quite model dependent.
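The Helmert identity in Remark (ii) is easy to confirm numerically. The short Python sketch below (hypothetical function names) computes the recursive residuals (7) and checks that their sum of squares equals the usual residual sum of squares about the overall mean.

```python
import math


def recursive_residuals(x):
    """z_n = {n/(n+1)}^(1/2) (x_{n+1} - mean of x_1..x_n), n = 1, ..., m-1, as in (7)."""
    z, s = [], 0.0
    for n in range(1, len(x)):
        s += x[n - 1]
        z.append(math.sqrt(n / (n + 1)) * (x[n] - s / n))
    return z


def residual_ss(x):
    """Sum over n of (x_n - xbar_m)^2."""
    xbar = sum(x) / len(x)
    return sum((v - xbar) ** 2 for v in x)
```

Because the identity is exact, the two quantities agree up to rounding error for any data set.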


An easy application of the theory of weak convergence of stochastic processes, e.g. Billingsley (1968), shows that the probabilities discussed above are given approximately by the corresponding probabilities defined in terms of a Brownian motion process $W(t)$, $0 \le t < \infty$. For example, the left hand side of (18) or (21) is approximately

$$\operatorname{pr}\left[\max_{t_0 \le t \le t_1} W_0(t)/\{t(1 - t)\}^{1/2} > b\right], \qquad (25)$$

where $W_0(t) = W(t) - tW(1)$ is a Brownian bridge process on $[0, 1]$ and $t_i = m_i/m$ ($i = 0, 1$).
The advantage of (25) as an approximation is its generality. It would serve also if the underlying distribution of the observations were not normal, hence for testing the hypothesis of no change in quite general models. This same generality is its disadvantage, because it means that the approximation is often a crude one.


Several authors, e.g. Mandl (1962), Keilson and Ross (1975), and DeLong (1981) have described numerical methods for evaluating (25) and have published numerical tables. We give here an approximation to (25), which is easily evaluated and which is valid in an arbitrary number of dimensions. A proof can be given along the lines of Siegmund's (1985, Theorem 11.1) argument for the one dimensional case.

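Let $W_0(t)$, $0 \le t \le 1$, be a $d$-dimensional Brownian bridge process, and let $\|\cdot\|$ denote the $d$-dimensional Euclidean norm. Let $0 < t_0 < t_1 < 1$ and set $r = t_1(1 - t_0)/\{t_0(1 - t_1)\}$. Then as $b \to \infty$

$$\operatorname{pr}\left[\max_{t_0 \le t \le t_1} \frac{\|W_0(t)\|}{\{t(1 - t)\}^{1/2}} > b\right] \approx \frac{b^d e^{-b^2/2}}{2^{d/2}\Gamma(d/2)}\left\{\left(1 - \frac{d}{b^2}\right)\log r + \frac{4}{b^2}\right\}, \qquad (26)$$

where $\Gamma(\cdot)$ is the gamma function.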

Remark. It is well known and easily verified that $W(t) = (1 + t)W_0\{t/(1 + t)\}$, $0 \le t < \infty$, is a standard $d$-dimensional Brownian motion process; with $u = t/(1 + t)$ one has $W(t)/t^{1/2} = W_0(u)/\{u(1 - u)\}^{1/2}$, and hence (26) also gives approximations to the probabilities appearing in (19) and (22).

The accuracy of (26) is easily ascertained by comparing it with extensive tables of DeLong (1981) for $d = 1, 2, 3, 4$. For example, for $d = 4$, $r = 50$, and $b = 3.85$, 4.10, and 4.58, DeLong gives for the probability in (26) the respective values .1, .05, .01. The right hand side of (26) yields .104, .051, and .0103. In fact, the approximation (26) is moderately good even when the probability is not close to zero, although there is no apparent mathematical reason why this should be the case.


As an approximation to the probability in (18), the right hand side of (26) with $d = 1$, or more precisely one half of it since (18) is one-sided, is much less satisfactory. The numerical example discussed extensively in Section 5 has $b = 2.82$, $m_0 = 5$, $m_1 = 35$, and $m = 40$, so $r = 49$. The approximation (18) yields .025, whereas (26) gives .041. A 2500 repetition Monte Carlo experiment using importance sampling along the lines indicated in Remark 4.45 of Siegmund (1985) yielded .0239 ± .0005.
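The figures quoted for (26) can be reproduced with a few lines of Python. The sketch assumes (26) in the form $b^d e^{-b^2/2}\{2^{d/2}\Gamma(d/2)\}^{-1}\{(1 - d/b^2)\log r + 4/b^2\}$ with $r = t_1(1 - t_0)/\{t_0(1 - t_1)\}$; this form reproduces the values .104, .051, and .0103 for $d = 4$, $r = 50$, and the value .041 (after halving) for $d = 1$, $r = 49$, $b = 2.82$. The function name is hypothetical.

```python
import math


def bessel_bridge_tail(b, d, r):
    """Right-hand side of (26) for a d-dimensional Brownian bridge, with
    r = t1 (1 - t0) / {t0 (1 - t1)}."""
    lead = b ** d * math.exp(-b * b / 2) / (2 ** (d / 2) * math.gamma(d / 2))
    return lead * ((1 - d / b ** 2) * math.log(r) + 4 / b ** 2)
```

Halving the $d = 1$ value converts the two-sided probability for $|W_0(t)|$ into the one-sided probability relevant to (18).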

Remark. Kiefer (1959) has computed $\operatorname{pr}\{\max_{0 \le t \le 1}\|W_0(t)\| \le b\}$ exactly in terms of an infinite series of Bessel functions, and has given tables for dimensions 2, 3, and 4. An approximation similar to (26) but requiring a somewhat different argument is given in Problem 11.1 of Siegmund (1985).


4.  Power. 


In this section we adapt the methods of Siegmund (1977, 1978, 1985) to obtain approximations to the power of (4), (5), and (9). The basic ideas, which go back to Anscombe (1952), are much simpler than in the preceding section and quite general. They are sketched below without details. We first consider (4) and (5), and later indicate the changes appropriate to handle (9).

The following result is related to Cramér's approximation for the probability of ruin of a risk process. See Feller (1972, Chapter XII) and Siegmund (1985, Chapter VIII).

Proposition 1. Let $\mu > 0$ and assume that $y_1, y_2, \ldots$ are independent $N(-\mu, 1)$. Then as $x \to \infty$

$$\operatorname{pr}\{y_1 + \cdots + y_n > x \text{ for some } n \ge 1\} \sim \nu(2\mu)\exp(-2\mu x),$$

where $\nu$ is defined by (16) and given approximately in (17).


For the rest of this section $x_1, \ldots, x_j$ are independent $N(\mu_1, 1)$, $x_{j+1}, \ldots, x_m$ are independent $N(\mu_m, 1)$, $\delta = \mu_m - \mu_1$, $S_n = x_1 + \cdots + x_n$, and $S_n^* = nS_m/m - S_n$, $n = 0, 1, \ldots, m$.


The process $S_n^*$, $n = 0, 1, \ldots, m$, has the mean value (11) and the covariance function

$$\operatorname{cov}(S_k^*, S_n^*) = k(1 - n/m) \qquad (k \le n) \qquad (27)$$

of a discrete time Brownian bridge, tied down to equal 0 at $n = 0$ and $n = m$. Let $c(t)$, $0 < t < 1$, be a function, and for $1 \le m_0 < m_1 < m$ let $\tau_0 = \inf\{n : n \ge m_0,\ S_n^* > mc(n/m)\}$.

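The power of the tests defined by (4) and (5) is of the form

$$\operatorname{pr}(\tau_0 \le m_1)$$

with $c(t) = bm^{-1/2}\{t(1 - t)\}^{1/2}$ and $c(t) = bm^{-1}$ respectively. Assume that $m_0 < j < m_1$; the other cases can be handled similarly. We begin with the obvious decomposition

$$\operatorname{pr}(\tau_0 \le m_1) = \int \operatorname{pr}(\tau_0 \le m_1 \mid S_j^* = \xi)\,\operatorname{pr}(S_j^* \in d\xi). \qquad (28)$$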

Since the marginal distribution of $S_j^*$ is known, to approximate (28) it suffices to approximate the conditional probability. Moreover, given $S_j^* = \xi$, the processes $S_n^*$, $n = 0, 1, \ldots, j$, and $S_n^*$, $n = j, j + 1, \ldots, m$, are conditionally independent and are themselves discrete time Brownian bridges with endpoints tied down at 0 and at $\xi$. Hence in terms of $\tau_1 = \sup\{n : n \le m_1,\ S_n^* > mc(n/m)\}$, we can write for $\xi < mc(j/m)$

$$\operatorname{pr}(\tau_0 \le m_1 \mid S_j^* = \xi) = \operatorname{pr}(\tau_0 < j \mid S_j^* = \xi) + \operatorname{pr}(\tau_1 > j \mid S_j^* = \xi) - \operatorname{pr}(\tau_0 < j \mid S_j^* = \xi)\,\operatorname{pr}(\tau_1 > j \mid S_j^* = \xi). \qquad (29)$$

Since both probabilities on the right hand side of (29) are of the same form, it suffices to consider the first one. We assume that $m$ is large and that $j$ and $j - m_0$ are proportional to $m$. For many boundary curves $c(t)$, including those of interest here, the principal contribution to the integral on the right hand side of (28) comes from values of $\xi$ close to $mc(j/m)$, say

$$\xi = mc(j/m) - x \qquad (30)$$

with $x = O(\log m)$ as $m \to \infty$. Given $S_j^* = \xi$ of the form (30), if $S_n^* > mc(n/m)$ for some $m_0 \le n < j$, this event with overwhelming probability occurs for some $n$ close to $j$. For $n$ close to $j$, say $n = j - k$, we have

$$mc(n/m) = mc(j/m) - kc'(j/m) + O(k^2/m).$$

Hence for $\xi$ of the form (30) and $k_m = o(m^{1/2})$

$$\operatorname{pr}(\tau_0 < j \mid S_j^* = \xi) \approx \operatorname{pr}\{S_{j-k}^* - S_j^* > x - kc'(j/m) \text{ for some } k \le k_m \mid S_j^* = \xi\}.$$

For $k \ll j$, given $S_j^* = \xi$, the process $S_{j-k}^* - S_j^*$, $k = 1, 2, \ldots$, behaves like a sum of independent normally distributed random variables, each having mean $-\xi/j \approx -mc(j/m)/j$ and variance 1. Consequently $\operatorname{pr}(\tau_0 < j \mid S_j^* = \xi)$ is approximately a probability of the form considered in Proposition 1 with $\mu = (j/m)^{-1}c(j/m) - c'(j/m)$. Although we have reasoned that the important values of $x$ are not large, if we nevertheless use the large $x$ approximation of Proposition 1 together with (17), i.e. $\nu(2\mu) \approx \exp(-2\mu\rho)$, we obtain

$$\operatorname{pr}(\tau_0 < j \mid S_j^* = \xi) \approx \exp[-2\{t^{*-1}c(t^*) - c'(t^*)\}(x + \rho)], \qquad (31)$$

where $t^* = j/m$.

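Consider the special case $c(t) = bm^{-1/2}\{t(1 - t)\}^{1/2}$, and assume that $bm^{-1/2} = \gamma$ and $jm^{-1} = t^*$ are fixed as $m \to \infty$. If we use (31) and a similar approximation for the other conditional probability on the right hand side of (29), substitute into (28), and evaluate the integral asymptotically as $m \to \infty$, we obtain the approximation

$$\operatorname{pr}[S_n^* > b\{n(1 - n/m)\}^{1/2} \text{ for some } m_0 \le n \le m_1] \approx 1 - \Phi(\xi) + m^{-1/2}\phi(\xi)\left(\frac{2\exp[-\rho\gamma/\{t^*(1 - t^*)\}^{1/2}]}{\delta\{t^*(1 - t^*)\}^{1/2}} - \frac{\exp[-2\rho\gamma/\{t^*(1 - t^*)\}^{1/2}]}{\gamma + \delta\{t^*(1 - t^*)\}^{1/2}}\right), \qquad (32)$$

where $\xi = m^{1/2}[\gamma - \delta\{t^*(1 - t^*)\}^{1/2}]$, $\gamma = bm^{-1/2}$, and $t^* = j/m$.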
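A transcription of the power approximation for the test (4) is sketched below in Python, under the assumptions of the preceding paragraph ($m_0 < j < m_1$, $\delta > 0$, $\nu$ replaced by (17)). For the boundary $c(t) = \gamma\{t(1-t)\}^{1/2}$ both drifts, $c(t^*)/t^* - c'(t^*)$ on the left and $c(t^*)/(1 - t^*) + c'(t^*)$ on the right, reduce to $\gamma/[2\{t^*(1-t^*)\}^{1/2}]$; integrating the resulting exponentials against the normal density of $S_j^*$ (mean $j(1 - j/m)\delta$, variance $j(1 - j/m)$) gives the two terms coded here. That integration step is our own working of the argument outlined above, so the sketch should be checked against simulation; the name `power_lr` is hypothetical.

```python
import math

RHO = 0.583  # the constant rho in (17)


def Phi(x):
    return 0.5 * math.erfc(-x / math.sqrt(2.0))


def phi(x):
    return math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)


def power_lr(b, m, j, delta, rho=RHO):
    """Approximate power of the one-sided likelihood ratio test (4), change after x_j.

    1 - Phi(xi) + m^(-1/2) phi(xi) [2 exp(-rho g/s)/(delta s) - exp(-2 rho g/s)/(g + delta s)]
    with t = j/m, s = {t(1-t)}^(1/2), g = b m^(-1/2), xi = m^(1/2) (g - delta s).
    Assumes m0 < j < m1 and delta > 0; m0 and m1 do not enter at this order.
    """
    t = j / m
    s = math.sqrt(t * (1 - t))
    g = b / math.sqrt(m)
    xi = math.sqrt(m) * (g - delta * s)
    bracket = (2 * math.exp(-rho * g / s) / (delta * s)
               - math.exp(-2 * rho * g / s) / (g + delta * s))
    return 1 - Phi(xi) + phi(xi) * bracket / math.sqrt(m)
```

With the Section 5 design ($b = 2.82$, $m = 40$) and a unit shift at $j = 20$ this gives roughly .75.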
The analogous approximation when $c(t) = b/m$ is easily obtained, but in this case it is possible to squeeze out a bit more accuracy for small samples by using a slightly different approximation along the lines suggested by Siegmund (1985, Example 8.77) in a similar context. The final approximation is omitted.

According to (13) the mean of the numerator in (9) is a nonlinear function of $k$, and hence it is convenient to center the process to have mean 0. If we also approximate (13) by $j\delta\log^+(k/j)$, the power of (9) can be expressed as

$$\operatorname{pr}\{S_n > mc(n/m) \text{ for some } m_0 \le n < m\}. \qquad (33)$$

Here $S_n = z_1 + \cdots + z_n$, where the $z$'s are independent standard normal random variables, and in terms of $t^* = j/m$ and $\gamma = bm^{-1/2}$

$$c(t) \approx \gamma t^{1/2} + \delta t^*\max\{\log(1 - t), \log t^*\}.$$

By conditioning on $S_{m-j}$ one can argue as above to derive an approximation to (33). The approximate drift of the conditional random walk $S_{m-j-k} - S_{m-j}$ given $S_{m-j} = mc(1 - t^*) - x$ is $-c(1 - t^*)/(1 - t^*)$, and hence the appropriate $\mu$ for an application of Proposition 1 is $c(1 - t^*)/(1 - t^*) - c'_-(1 - t^*)$, where $c'_-$ denotes the left hand derivative of $c$. However, since $S_{m-j+k} - S_{m-j}$, $k = 1, 2, \ldots, j$, is not tied down at $k = j$, its drift is 0 and the appropriate value of $\mu$ for this part of the path is $c'_+(1 - t^*)$, where $c'_+$ denotes the right hand derivative. The resulting approximation is

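$$1 - \Phi(\xi) + m^{-1/2}(1 - t^*)^{1/2}\phi(\xi)\left[\frac{\exp(-\rho[\gamma/(1 - t^*)^{1/2} + 2\delta\{t^*\log t^*/(1 - t^*) + 1\}])}{\delta\{2(1 - t^*) + t^*\log t^*\}} + \frac{\exp\{-\rho\gamma/(1 - t^*)^{1/2}\}}{\delta t^*\log(1/t^*)} - \frac{\exp(-2\rho[\gamma/(1 - t^*)^{1/2} + \delta\{t^*\log t^*/(1 - t^*) + 1\}])}{\gamma(1 - t^*)^{1/2} + \delta\{2(1 - t^*) + t^*\log t^*\}}\right], \qquad (34)$$

where $\xi = m^{1/2}[\gamma + \delta t^*\log t^*/(1 - t^*)^{1/2}]$, and as always $\rho$ is the constant appearing in (17).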
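The approximation for the recursive residual test (9) can be coded in the same way. The Python sketch below uses the boundary $c(t) \approx \gamma t^{1/2} + \delta t^*\max\{\log(1 - t), \log t^*\}$, the two drifts $c(1 - t^*)/(1 - t^*) - c'_-(1 - t^*)$ and $c'_+(1 - t^*)$ described above, and integrates against the $N\{0, m(1 - t^*)\}$ density of $S_{m-j}$; the prefactor and the three terms follow our own working of that integral, so it is a sketch to be checked against simulation rather than a definitive formula. The name `power_recursive` is hypothetical, and the value $b = 2.8$ used in the test is illustrative, not a critical value taken from the tables.

```python
import math

RHO = 0.583  # the constant rho in (17)


def Phi(x):
    return 0.5 * math.erfc(-x / math.sqrt(2.0))


def phi(x):
    return math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)


def power_recursive(b, m, j, delta, rho=RHO):
    """Approximate power of the recursive residual test (9), change after x_j (delta > 0)."""
    t = j / m
    g = b / math.sqrt(m)                          # gamma = b m^(-1/2)
    a = g / math.sqrt(1 - t)                      # gamma / (1 - t*)^(1/2)
    lt = t * math.log(t) / (1 - t)                # t* log t* / (1 - t*), negative
    xi = math.sqrt(m) * (g + delta * t * math.log(t) / math.sqrt(1 - t))
    d0 = delta * (2 * (1 - t) + t * math.log(t))  # denominator from the backward piece
    d1 = delta * t * math.log(1 / t)              # denominator from the forward piece
    term0 = math.exp(-rho * (a + 2 * delta * (lt + 1))) / d0
    term1 = math.exp(-rho * a) / d1
    term2 = math.exp(-2 * rho * (a + delta * (lt + 1))) / (g * math.sqrt(1 - t) + d0)
    return 1 - Phi(xi) + math.sqrt((1 - t) / m) * phi(xi) * (term0 + term1 - term2)
```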


Remark. The methods of this section easily yield approximations to the power of (5). However, for (8) they appear to work only when $t^* > e^{-2}$. Otherwise the asymptotic normalization of Daniels (1974) may be more useful. See also Barbour (1981).

5. Numerical Comparisons.

Tables  1-3  below  compare  the  power  of  the  statistics  (4),  (5),  (9),  and  (10).  To  keep 
the  tables  digestible,  only  the  case  of  a  sample  size  m  =  40  and  one-sided  significance  level 
.025  is  considered.  Two  issues  are  involved:  (i)  the  accuracy  of  the  approximations  given 
in  Section  4  and  (ii)  the  comparative  power  of  the  various  test  statistics.  To  verify  that  the 
approximations  are  sufficiently  accurate  to  give  a  reasonable  picture  of  the  relative  merits 
of  the  various  tests,  the  outcome  of  a  9999  repetition  Monte  Carlo  experiment  is  given  in 
parentheses  in  most  of  the  cells.  Other  numerical  calculations,  not  reported  here,  show  that 
the essential conclusions are unchanged over a range of significance levels and sample sizes,
although  the  magnitude  of  the  differences  can  be  more  or  less  for  different  sample  sizes. 
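The Monte Carlo checks in the tables can be imitated, at least for the likelihood ratio statistic (4), with a plain simulation. The Python sketch below uses the design of Table 1 ($m = 40$, $b = 2.82$, $m_0 = 5$, $m_1 = 35$) and simple random sampling, not the importance sampling mentioned in Section 3; the function names, seeds, and replication counts are our own choices, so the estimates carry ordinary Monte Carlo error.

```python
import math
import random


def stat4(x, m0, m1):
    """One-sided likelihood ratio statistic (4) for unit-variance data x."""
    m, total, s, best = len(x), sum(x), 0.0, -math.inf
    for k in range(1, m1 + 1):
        s += x[k - 1]                              # s = S_k
        if k >= m0:
            best = max(best, (k * total / m - s) / math.sqrt(k * (1 - k / m)))
    return best


def rejection_rate(b, m, m0, m1, j=None, delta=0.0, reps=10000, seed=1):
    """Fraction of simulated samples with stat4 > b; x_{j+1}, ..., x_m are shifted by delta."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(reps):
        x = [rng.gauss(0.0, 1.0) + (delta if j is not None and n >= j else 0.0)
             for n in range(m)]
        hits += stat4(x, m0, m1) > b
    return hits / reps
```

Under $H_0$ the estimated rejection rate should be near the .0239 reported in Section 3, and with a unit shift at $j = 20$ it should be near the value given by (32).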

Table 1 involves the likelihood ratio statistic (4) with two different choices of $m_0$ and $m_1$. Table 2 studies the recursive residual statistic (9). Since this statistic is not symmetric with respect to the ordering of the time scale, i.e. a change at $j$ is not equivalent to a change at $m - j$, this table is slightly more elaborate than the others. Table 3 contains Pettitt's statistic (5) and the Chernoff-Zacks quasi Bayesian statistic (10).

Roughly speaking, the two likelihood ratio statistics and the recursive residual statistic perform about the same, while the Pettitt and Chernoff-Zacks statistics have somewhat greater power to detect changes occurring near $j = m/2$ and less power to detect changes occurring near $j = 0$ or $j = m$. Some of these differences were predicted from qualitative considerations in Section 2, and what the numerical calculations add is a feeling for the magnitude of the differences. The modified likelihood ratio statistic with $m_0 > 1$ and $m_1 < m - 1$ has power at $j \approx m/2$ which improves over the unmodified likelihood ratio statistic. It must pay for this improvement by having less power for $j$ close to 0 or $m$, although in the range of $j$ studied, the cost is not apparent.

One possible conclusion is that one should choose a test statistic on a subjective basis,
depending on where one "expects" a change to take place, should there be one. A difficulty
with this recommendation is that, in our view, change-point statistics are often applied to
retrospective data, frequently after something resembling a change has been noticed in
informal investigations. If so, it would not be appropriate to make such a subjective choice
of test statistic.

The numerical results lend support to the argument that the recursive residual statistic
is not demonstrably inferior to the others, and since it generalizes immediately to a
regression context, it seems a reasonably good general purpose statistic. Its weak points are
the arbitrariness in its definition, noted in Section 2 and reflected in the lack of symmetry
about j ≈ m/2 in Table 2, and the difficulty of using it for estimation.
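In the one-sample setting of this paper, a common definition of recursive residuals (the mean-only case of Brown, Durbin, and Evans, 1975) is w_k = (x_k − x̄_{k−1}){(k − 1)/k}^{1/2}, k = 2, ..., m, where x̄_{k−1} is the mean of the first k − 1 observations. We assume, without checking against (9), that this is the construction involved, possibly with a different starting index m0. The sketch shows that the residuals reproduce the usual residual sum of squares but change when the time order is reversed, which is the arbitrariness referred to above.

```python
import math

def recursive_residuals(x):
    """w_k = (x_k - mean of x_1..x_{k-1}) * sqrt((k-1)/k) for k = 2, ..., m."""
    w, running_sum = [], x[0]
    for k in range(2, len(x) + 1):
        prev_mean = running_sum / (k - 1)
        w.append((x[k - 1] - prev_mean) * math.sqrt((k - 1) / k))
        running_sum += x[k - 1]
    return w

x = [0.3, -1.2, 0.8, 2.1, 1.7, 2.4]
xbar = sum(x) / len(x)
rss = sum((v - xbar) ** 2 for v in x)
forward, backward = recursive_residuals(x), recursive_residuals(x[::-1])
# Both orderings reproduce the residual sum of squares (differences are rounding error) ...
print(sum(w * w for w in forward) - rss, sum(w * w for w in backward) - rss)
# ... but the individual residuals, and hence any statistic built from them, differ.
print([round(w, 3) for w in forward])
print([round(w, 3) for w in backward])
```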

The Chernoff-Zacks statistic does not seem to have any distinct advantage over the
Pettitt statistic, except the simplicity of its sampling distribution, and even that vanishes
if the variance is unknown. Since the Pettitt statistic gives simple and natural estimates of
j and δ, it seems preferable.

If estimation of j and/or δ is an important consideration, the preferred statistics appear
to be the modified likelihood ratio statistic and Pettitt's statistic, which are not inferior as
test statistics and provide natural estimates. Although neither of these tests dominates the
other, the modified likelihood ratio test is perhaps slightly preferable because it performs
better when j is near 0 or m, where all tests are weak.
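One natural reading of these "natural estimates", which is our assumption since the estimators are not written out in this section, is the following. Estimate the change-point by the index that maximizes the standardized statistic over m0 ≤ j ≤ m1, and estimate δ by the difference between the sample means after and before that index. For the likelihood ratio statistic with known variance these are the maximum likelihood estimates; for Pettitt's statistic one would instead maximize the unstandardized |S_j − jS_m/m|. The function name is hypothetical.

```python
import math

def estimate_change(x, m0=1, m1=None):
    """Return (j_hat, delta_hat): j_hat maximizes |S_j - j S_m/m| / sqrt(j (1 - j/m))
    over m0 <= j <= m1, and delta_hat = mean(x[j_hat:]) - mean(x[:j_hat])."""
    m = len(x)
    if m1 is None:
        m1 = m - 1
    partial = [0.0]
    for v in x:
        partial.append(partial[-1] + v)   # partial[j] = S_j
    s_m = partial[m]

    def standardized(j):
        return abs(partial[j] - j * s_m / m) / math.sqrt(j * (1 - j / m))

    j_hat = max(range(m0, m1 + 1), key=standardized)
    delta_hat = (s_m - partial[j_hat]) / (m - j_hat) - partial[j_hat] / j_hat
    return j_hat, delta_hat

x = [0.1, -0.2, 0.0, 0.1, 1.2, 0.9, 1.0, 1.1]
print(estimate_change(x))              # j_hat = 4, delta_hat close to 1.05
print(estimate_change(x, m0=5, m1=7))  # the restricted range forces j_hat = 5
```

Restricting the search to m0 ≤ j ≤ m1, as in the modified statistic, also confines the estimate of j to that range.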

Some speculations about the use of these test statistics in different contexts are given
in the following section.


Table 1

Likelihood Ratio Statistics

                      Power
 δ      j     (4): b = 2.95, m0 = 1, m1 = 39    (4): b = 2.82, m0 = 5, m1 = 35
 0     "∞"    .0254                             .0250
 .8     20    .482 (.499)                       .540 (.541)
        10    .353 (.377)                       .409 (.406)
         5    .186 (.209)                       .182 (.218)
 1.0    20    .706 (.716)                       .753 (.758)
        10    .549 (.568)                       .605 (.608)
         5    .301 (.319)                       .300 (.337)
 1.2    20    .872 (.878)                       .900 (.905)
        10    .737 (.751)                       .781 (.785)
         5    .444 (.470)                       .447 (.475)

Table 2

Recursive Residual Statistic (9): b = 2.65, m0 = 5

 δ      j     Power           j     Power
 .8     20    .555 (.573)
        10    .364 (.379)     30    .462 (.477)
         5    .172 (.165)     35    .237 (.270)
 1.0    20    .758 (.770)
        10    .534 (.558)     30    .654 (.660)
         5    .244 (.258)     35    .368 (.398)
 1.2    20    .899 (.904)
        10    .702 (.728)     30    .815 (.821)
         5    .334 (.367)     35    .520 (.547)

Table 3

Pettitt and Chernoff-Zacks Statistics

                  Power
 δ      j     (5): b = 8.01    (10): b = 240
 0     "∞"    .0250            .0250
 .8     20    .633             .591
        10    .401             .376
         5    .137             .158
 1.0    20    .826             .782
        10    .593             .538
         5    .205             .223
 1.2    20    .939             .908
        10    .769             .693
         5    .297             .301

Remark. It is natural to complement these numerical examples with a discussion of asymptotic
efficiency, but we do not know a satisfactory formulation. Although it is a simple
matter to compute the Bahadur efficiency of the tests, this measure seems too crude to
provide much insight for reasonable sample sizes. For example, it would say that the
likelihood ratio statistic (2) is better than the others for every δ and for change-points j
proportional to m as m → ∞. For a discussion of Bahadur efficiency in a regression
context, see Deshayes and Picard (1982).

6. Generalizations and Variations.

There are two generalizations of the simple model of the preceding sections which have
received considerable attention in the literature. One involves relaxation of the normality
assumption to allow other distributions, say in an exponential family. Change-point problems
with Poisson and/or Bernoulli data are discussed by Kendall and Kendall (1980) and Levin
and Kline (1985). A more subtle variation occurs in Matthews and Farewell (1982), who
introduce a model for survival after therapy. In their model the effect of therapy is not
instantaneous: deaths continue to occur at a constant hazard rate until an unknown
time when the therapy becomes effective, and the hazard rate for those still alive then decreases
to a second constant value.

Kendall and Kendall (1980) discuss a modification of the likelihood ratio statistic and
observe that one can approximate its null hypothesis distribution by that of the appropriate
functional of a Brownian bridge process. On the basis of what appear to be limited
comparisons of simulations of the Poisson process with numerical computations for the Brownian
bridge given by Mandl (1962), they conclude that the approximation is reasonable.

The  asymptotic  formula  (26)  gives  an  accurate  and  easily  evaluated  approximation 
to the relevant Brownian bridge probability in an arbitrary number of dimensions, but
in general one should expect that it is a rather crude approximation for the probability
of interest. In particular, for the one-dimensional problem with normal data considered
throughout  this  paper,  it  often  overestimates  the  correct  probability  by  40  to  100%.  It 
should  be  possible  to  give  a  precise  asymptotic  approximation  similar  to  (18)  for  the  Poisson 
case;  but  it  remains  to  be  seen  whether  more  sophisticated  mathematical  analysis  actually 
leads  to  a  better  approximation. 
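A rough illustration of that overestimate, under our own simplifying choice of statistic: for independent N(0, 1) observations and the one-sided statistic max over 0 < n < m of (S_n − nS_m/m), the Brownian bridge approximation to the tail probability at level b is exp(−2b²/m), which ignores the discreteness of the partial sums. The sketch compares it with a direct Monte Carlo estimate. The values m = 40 and b = 8.01 are simply convenient (they appear in Table 4), and the function names are hypothetical.

```python
import math
import random

def bridge_tail(b, m):
    """Brownian bridge approximation exp(-2 b^2 / m) to P{max_n (S_n - n S_m/m) > b}."""
    return math.exp(-2.0 * b * b / m)

def direct_tail(b, m, reps=20000, seed=1):
    """Direct Monte Carlo estimate of the same probability for i.i.d. N(0, 1) data."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(reps):
        x = [rng.gauss(0.0, 1.0) for _ in range(m)]
        mean = sum(x) / m
        s = 0.0
        for v in x[:-1]:          # n = 1, ..., m - 1
            s += v - mean         # running value of S_n - n S_m / m
            if s > b:
                hits += 1
                break
    return hits / reps

m, b = 40, 8.01
approx, mc = bridge_tail(b, m), direct_tail(b, m)
print(f"bridge {approx:.4f}   Monte Carlo {mc:.4f}   ratio {approx / mc:.2f}")
```

With these values the bridge approximation is about .040, while the simulated probability should come out near .025, an overestimate of roughly 60%, in line with the range quoted above.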

Note that the recursive residual idea depends heavily on the assumption of normality
for its exact validity, although Sen (1982) has shown that it is asymptotically valid
under quite general conditions. We do not know of any attempt to study the accuracy of
Sen's approximations for sample sizes of interest.

Regression models are a second generalization which has been discussed in a number
of papers, particularly with reference to econometric data. See, for example, Quandt (1958,
1960), Brown, Durbin, and Evans (1975), and Worsley (1983). The recursive residual
concept adapts nicely to this setting (Brown et al., 1975), and indeed the null hypothesis
distribution is basically no different than in the simple normal case. A precise asymptotic
analysis of likelihood-ratio-like statistics is rather complicated and depends on the spacings
between the dependent variables, which translate into the variances of the normal
observations making up a multidimensional statistic similar to (4). Again the multidimensional
Brownian bridge provides a crude approximation, which is probably adequate for practical
purposes, although rather unsatisfying theoretically.
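A minimal sketch of recursive residuals in the regression setting of Brown, Durbin, and Evans (1975), for the illustrative straight-line model y_k = α + βt_k + e_k. Here w_k is the error in predicting y_k from the least squares fit to the first k − 1 points, divided by {1 + x_k'(X_{k−1}'X_{k−1})^{−1}x_k}^{1/2} with x_k = (1, t_k)'. With independent N(0, σ²) errors and no change, the w_k are independent N(0, σ²), which is why the null distribution is essentially that of the simple normal case. The final check uses the standard identity that the squared recursive residuals add up to the residual sum of squares of the full fit. The model and function names are our illustrative choices.

```python
import math

def fit_line(t, y):
    """Least squares intercept and slope for y = alpha + beta * t, plus (X'X)^{-1}."""
    n = len(t)
    st, stt = sum(t), sum(v * v for v in t)
    sy, sty = sum(y), sum(a * b for a, b in zip(t, y))
    det = n * stt - st * st
    inv = [[stt / det, -st / det], [-st / det, n / det]]
    alpha = inv[0][0] * sy + inv[0][1] * sty
    beta = inv[1][0] * sy + inv[1][1] * sty
    return alpha, beta, inv

def regression_recursive_residuals(t, y):
    """w_k, k = 3..n: one-step prediction error of y_k from the fit to the first k-1 points,
    divided by sqrt(1 + x_k'(X_{k-1}'X_{k-1})^{-1} x_k) with x_k = (1, t_k)'."""
    w = []
    for k in range(3, len(t) + 1):
        alpha, beta, inv = fit_line(t[:k - 1], y[:k - 1])
        tk = t[k - 1]
        quad_form = inv[0][0] + 2 * inv[0][1] * tk + inv[1][1] * tk * tk
        w.append((y[k - 1] - alpha - beta * tk) / math.sqrt(1 + quad_form))
    return w

t = [1, 2, 3, 4, 5, 6, 7, 8]
y = [1.1, 1.9, 3.2, 3.8, 5.1, 6.3, 6.8, 8.2]
w = regression_recursive_residuals(t, y)
alpha, beta, _ = fit_line(t, y)
rss = sum((b - alpha - beta * a) ** 2 for a, b in zip(t, y))
print(len(w), abs(sum(v * v for v in w) - rss) < 1e-9)   # 6 True
```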

Remark.  In  a  regression  context,  there  is  some  ambiguity  in  the  definition  of  a  change-point 
problem.  Scientifically,  it  may  well  be  reasonable  to  assume  that  the  regression  function 
is  continuous  at  the  change-point.  We  are  assuming,  however,  along  with  most  authors, 
that  the  regression  function  can  jump.  It  seems  plausible  that  for  testing  the  hypothesis  of 
no  change,  not  much  power  is  lost  by  allowing  this  additional  degree  of  freedom  under  the 
alternative, but the situation is less clear with regard to estimation. See Hinkley (1969) or
Feder (1975a,b) for a discussion of the case in which the regression function is required to
be continuous at the change-point.


Appendix 

Approximate Significance Levels for Studentized Processes

In this appendix we describe an approach to deriving the approximations (21), (22),
and (24). A complete development is quite complicated and will be given elsewhere. Here we
restrict attention to the simplest case, namely (24), or more precisely to a slight generalization
of (24) which involves one important ingredient of (21) and (22) as well. The method is
adapted from Siegmund (1982, 1985).

Let $y_1, y_2, \ldots, y_m$ be independent $N(\mu, \sigma^2)$ variables, and put $S_n = y_1 + \cdots + y_n$, $U_n = y_1^2 + \cdots + y_n^2$. It is convenient to introduce the notation

$$P^{(m)}_{\xi,\lambda}(A) = \Pr(A \mid S_m = \xi,\ U_m = \lambda),$$

which by sufficiency does not depend on the parameter $(\mu, \sigma^2)$. Let $\lambda_0 = \lambda/m$ and consider

pW(m«_S,>uf),  (X.1) 

Let $\hat\sigma^2 = m^{-1}U_m - (m^{-1}S_m)^2$. Since the distribution of the process $\{(S_n - nS_m/m)/\hat\sigma,\ n = 0, 1, \ldots, m\}$ does not depend on $(\mu, \sigma^2)$, the process is independent of the complete sufficient statistic $(S_m, U_m)$ by Basu's theorem (Lehmann, 1959, Theorem 5.2). Therefore the left hand side of (24) is equal to (A.1) for all $\lambda > 0$.
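The invariance behind the Basu argument is easy to check numerically. Replacing every y_i by a + cy_i with c > 0 multiplies both S_n − nS_m/m and σ̂ by c, so the studentized process is unchanged and its distribution cannot depend on (μ, σ²). The sketch takes σ̂² = m⁻¹U_m − (m⁻¹S_m)²; any fixed multiple of it would serve equally well. The function name is ours.

```python
import math
import random

def studentized_bridge(y):
    """(S_n - n S_m/m) / sigma_hat for n = 0, ..., m, with sigma_hat^2 = U_m/m - (S_m/m)^2."""
    m = len(y)
    s_m, u_m = sum(y), sum(v * v for v in y)
    sigma_hat = math.sqrt(u_m / m - (s_m / m) ** 2)
    path, s = [0.0], 0.0
    for n, v in enumerate(y, start=1):
        s += v
        path.append((s - n * s_m / m) / sigma_hat)
    return path

rng = random.Random(0)
y = [rng.gauss(3.0, 2.0) for _ in range(10)]     # mu = 3, sigma = 2
a, c = -5.0, 0.3                                  # any location shift and positive scale
p1 = studentized_bridge(y)
p2 = studentized_bridge([a + c * v for v in y])
print(max(abs(u - v) for u, v in zip(p1, p2)))   # zero up to rounding error
```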

Let $\eta$ be a real number and define

$$\tau = \inf\{n : S_n > b + \eta n\}. \qquad (A.2)$$

Then in the special case $\eta = 0$, (A.1) equals the conditional probability of $\{\tau < m\}$. We study this probability more generally in the following theorem.

Theorem. Assume $b = m\zeta$, $\xi = m\xi_0$, $\lambda = m\lambda_0$ for some $\xi_0 < \zeta + \eta$ and $\lambda_0 > \xi_0^2$. Then


Fff{r  <  m 


_ ?(?£±±-(o) _ 1/ 

^  L{Ao-4»,?-(2f-eom2j  I 


Aq  -  4n(  -  (2f  -  eo)^  \ 


Ao  -  (S 


,  (A.3) 


where $\nu$ is defined in (16) and given approximately in (17).


Proof. Let $L_n = L^{(m)}(n, S_n, U_n; \xi, \lambda, \xi_1, \lambda_1)$ denote the likelihood ratio of $y_1, \ldots, y_n$ under $P^{(m)}_{\xi,\lambda}$ relative to $P^{(m)}_{\xi_1,\lambda_1}$. A straightforward calculation shows that for $n \le m - 2$,

w-i  f  1'"-”-”'/’ ,■  A. -{?M\ 

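For $n = m - 1$ the two probability measures are not absolutely continuous with respect to each other, but we do not need to consider this case. By Wald's likelihood ratio identity (e.g. Siegmund, 1985, p. 13), for all $m' < m - 1$

$$P^{(m)}_{\xi,\lambda}\{\tau \le m'\} = E^{(m)}_{\xi_1,\lambda_1}\{L_\tau;\ \tau \le m'\}. \qquad (A.5)$$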

It is easy to see that for any fixed $i = 1, 2, \ldots$, $P^{(m)}_{\xi,\lambda}\{m - i < \tau < m\}$ is of smaller order of magnitude than (A.3), so it suffices to show that for suitable $\xi_1$, $\lambda_1$, and $m' = m - i$ the right hand side of (A.5) is asymptotic to (A.3).

Let $\xi_1 = m\{2(\zeta + \eta) - \xi_0\}$, $\Delta = 4\eta(\zeta + \eta - \xi_0)$, and $\lambda_1 = m(\lambda_0 + \Delta)$. The choice of $\xi_1$, which is the reflection of the endpoint $\xi$ about the boundary value $b + \eta m = m(\zeta + \eta)$, has a natural interpretation in terms of the reflection principle (cf. Siegmund, 1985, p. 39 ff.), but our justification for the choice of $\lambda_1$ is only that it "works." Note that $\Delta = 0$ if $\eta = 0$, and this case would be adequate to prove (24). But dealing with arbitrary $\eta$ is a useful warm-up for the proof of (21) and (22).

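Some algebraic manipulation shows that

$$\frac{\lambda_1 - U_\tau - (\xi_1 - S_\tau)^2/(m - \tau)}{\lambda - U_\tau - (\xi - S_\tau)^2/(m - \tau)} = 1 + \frac{4(\zeta + \eta - \xi_0)(S_\tau - m\zeta - \eta\tau)}{(m - \tau)\{\lambda_0 - U_\tau/m - (\xi - S_\tau)^2/m(m - \tau)\}}. \qquad (A.6)$$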


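Law of large numbers arguments indicate that under $P^{(m)}_{\xi_1,\lambda_1}$

$$m^{-1}\tau \to \zeta/(2\zeta + \eta - \xi_0), \qquad m^{-1}U_\tau \to \zeta\{\lambda_0 + 4\eta(\zeta + \eta - \xi_0)\}/(2\zeta + \eta - \xi_0),$$

and

$$(\xi - S_\tau)^2/m(m - \tau) \to (\zeta + \eta - \xi_0)(2\zeta - \xi_0)^2/(2\zeta + \eta - \xi_0). \qquad (A.7)$$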

The results in (A.7) show that the right hand side of (A.6) is $1 + O_p(m^{-1})$. Hence by writing $(\cdot) = \exp\{\log(\cdot)\}$, using (A.6) and a Taylor series expansion, we see that the first factor on the right hand side of (A.4) has the same limit as

$$\exp[-2(\zeta + \eta - \xi_0)(S_\tau - m\zeta - \eta\tau)/\{\lambda_0 - U_\tau/m - (\xi - S_\tau)^2/m(m - \tau)\}],$$

which by (A.7) has the same limit as

$$\exp[-2(2\zeta + \eta - \xi_0)(S_\tau - m\zeta - \eta\tau)/\{\lambda_0 - 4\eta\zeta - (2\zeta - \xi_0)^2\}]. \qquad (A.8)$$

Assuming  that  we  can  take  these  limits  inside  the  expectation,  we  see  from  (A.4),  (A.5), 
and  (A.8)  that 

limPffir  <  m  -  3}I(Ao  -  €o’)/{Ao  -  4»7f  -  (2f  - 

^  lim^J”J^{exp(-2(2f  +  17  -  (o)Jlr/{^o  -  47?  “  (2?  -  fo)’}l:  r  <  m  -  3}, 

where $R_\tau = S_\tau - m\zeta - \eta\tau$ is the excess over the stopping boundary. If we were dealing with an unconditional probability making the $y$'s independent and identically distributed with the same mean and variance as under $P^{(m)}_{\xi_1,\lambda_1}$, the renewal theorem would allow us to finish the proof along established lines (e.g. Siegmund, 1985, Chapter VIII). Over the relatively short interval in which $m^{-1}\tau$ falls with $P^{(m)}_{\xi_1,\lambda_1}$-probability close to one (cf. (A.7)), the conditional and unconditional processes behave essentially the same, leading one to expect the same limiting result for the $P^{(m)}_{\xi_1,\lambda_1}$-distribution of $R_\tau$. This anticipated result can be proved by ad hoc methods or by appealing to a general theorem in Inchi Hu's unpublished Stanford thesis, thus completing our informal proof of the theorem.
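The algebra in this proof can be spot-checked numerically. The sketch below does three things. (i) It compares the two sides of (A.6) at arbitrary values of τ, S_τ, and U_τ, with ξ1 = m{2(ζ + η) − ξ0}, Δ = 4η(ζ + η − ξ0), and λ1 = m(λ0 + Δ). (ii) It substitutes the limits (A.7) into the coefficient of R_τ = S_τ − mζ − ητ in the exponent preceding (A.8) and compares the result with the coefficient in (A.8); here the third limit in (A.7) is read as (ζ + η − ξ0)(2ζ − ξ0)²/(2ζ + η − ξ0). (iii) It simulates one long path from the conditional law with parameters (ξ1, λ1), drawn uniformly from the sphere {Σy_i = ξ1, Σy_i² = λ1} by centering and rescaling independent normals, to show m⁻¹τ settling near ζ/(2ζ + η − ξ0). Parameter values and function names are arbitrary illustrations, and (iii) requires λ0 − 4ηζ − (2ζ − ξ0)² > 0.

```python
import math
import random

def a6_sides(m, zeta, eta, xi0, lam0, tau, s, u):
    """Left and right sides of (A.6) at tau, S_tau = s, U_tau = u."""
    xi, lam = m * xi0, m * lam0
    xi1 = m * (2 * (zeta + eta) - xi0)
    lam1 = m * (lam0 + 4 * eta * (zeta + eta - xi0))
    lhs = (lam1 - u - (xi1 - s) ** 2 / (m - tau)) / (lam - u - (xi - s) ** 2 / (m - tau))
    bracket = lam0 - u / m - (xi - s) ** 2 / (m * (m - tau))
    rhs = 1 + 4 * (zeta + eta - xi0) * (s - m * zeta - eta * tau) / ((m - tau) * bracket)
    return lhs, rhs

def r_coefficients(zeta, eta, xi0, lam0):
    """Coefficient of R_tau before (A.8) with the limits (A.7) substituted, and the one in (A.8)."""
    c = 2 * zeta + eta - xi0
    u_lim = zeta * (lam0 + 4 * eta * (zeta + eta - xi0)) / c
    q_lim = (zeta + eta - xi0) * (2 * zeta - xi0) ** 2 / c
    before = -2 * (zeta + eta - xi0) / (lam0 - u_lim - q_lim)
    after = -2 * c / (lam0 - 4 * eta * zeta - (2 * zeta - xi0) ** 2)
    return before, after

def simulated_tau_fraction(m, zeta, eta, xi0, lam0, seed=3):
    """m^{-1} tau along one path drawn from the conditional law with parameters (xi_1, lambda_1)."""
    rng = random.Random(seed)
    xi1 = m * (2 * (zeta + eta) - xi0)
    lam1 = m * (lam0 + 4 * eta * (zeta + eta - xi0))
    z = [rng.gauss(0.0, 1.0) for _ in range(m)]
    zbar = sum(z) / m
    scale = math.sqrt(lam1 - xi1 ** 2 / m) / math.sqrt(sum((v - zbar) ** 2 for v in z))
    s = 0.0
    for n in range(1, m + 1):
        s += xi1 / m + scale * (z[n - 1] - zbar)
        if s > m * zeta + eta * n:         # tau = inf{n : S_n > b + eta n} with b = m zeta
            return n / m
    return 1.0

rng = random.Random(1)
for _ in range(3):
    args = (50, rng.uniform(0.1, 0.5), rng.uniform(-0.2, 0.2), rng.uniform(-0.1, 0.1),
            rng.uniform(1.0, 2.0), rng.randint(1, 40), rng.uniform(0.0, 10.0), rng.uniform(0.0, 10.0))
    print(a6_sides(*args))                                      # entries agree up to rounding
print(r_coefficients(0.3, 0.1, 0.1, 1.0))                       # both about -1.905
print(simulated_tau_fraction(4000, 0.3, 0.1, 0.1, 1.0), 0.5)   # close for large m
```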

The fundamental identity (A.5) can be used to provide an effective variance reducing
device if one wants to perform a Monte Carlo experiment to check the accuracy of the
approximation (A.3). However, there is an interesting pitfall, which is not present in the
use of this technique as suggested by Siegmund (1975, 1985, Remark 4.45). In particular,
it is apparent from (A.4) and (A.6) that when τ = m − 2, L_τ can be extremely large.
Although this happens with small probability, it can create an overflow; and even if the
likelihood ratio is truncated, it can completely distort the Monte Carlo estimator. To avoid
the problem, it evidently suffices to use (A.5) with m' = m − 3 rather than m − 2, even
though this slightly increases the bias of the resulting estimator.
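The following is a minimal sketch of this variance reducing device under stated assumptions. Paths are drawn from the conditional law given S_m = ξ1, U_m = λ1, which is uniform on the sphere {Σy_i = ξ1, Σy_i² = λ1}. Each crossing with τ ≤ m' is then weighted by L_τ. For L_n we use a form we derived ourselves from that spherical description; it is an assumption here, not a transcription of (A.4). For n ≤ m − 3 it reads

$$\log L_n = \tfrac12(m - n - 3)\log\frac{A_n(\xi,\lambda)}{A_n(\xi_1,\lambda_1)} + \tfrac12(m - 3)\log\frac{\lambda_1 - \xi_1^2/m}{\lambda - \xi^2/m}, \qquad A_n(\xi,\lambda) = \lambda - U_n - \frac{(\xi - S_n)^2}{m - n}.$$

Its exponent (m − n − 3)/2 is negative only at n = m − 2, which matches the overflow mechanism described above. The sketch targets P{τ ≤ m'} with m' = m − 3 and compares the weighted estimate with a direct frequency count of the same event. The parameter values (m = 20, ζ = 0.25, η = ξ0 = 0, λ0 = 1) and all function names are illustrative.

```python
import math
import random

def sample_conditional(m, xi, lam, rng):
    """y_1..y_m uniform on {sum y = xi, sum y^2 = lam}: the conditional law given S_m, U_m."""
    z = [rng.gauss(0.0, 1.0) for _ in range(m)]
    zbar = sum(z) / m
    scale = math.sqrt(lam - xi * xi / m) / math.sqrt(sum((v - zbar) ** 2 for v in z))
    return [xi / m + scale * (v - zbar) for v in z]

def first_passage(y, b, eta, m_prime):
    """(tau, S_tau, U_tau) for tau = inf{n : S_n > b + eta n} if tau <= m_prime, else None."""
    s = u = 0.0
    for n in range(1, m_prime + 1):
        s += y[n - 1]
        u += y[n - 1] ** 2
        if s > b + eta * n:
            return n, s, u
    return None

def log_lr(n, s, u, m, xi, lam, xi1, lam1):
    """ASSUMED log-likelihood ratio (our derivation, not the paper's (A.4)) of y_1..y_n
    under the (xi, lam) conditional law relative to the (xi1, lam1) law; needs n <= m - 3."""
    a = lam - u - (xi - s) ** 2 / (m - n)
    a1 = lam1 - u - (xi1 - s) ** 2 / (m - n)
    if a <= 0.0:
        return -math.inf  # this path cannot occur under the (xi, lam) law
    return (0.5 * (m - n - 3) * math.log(a / a1)
            + 0.5 * (m - 3) * math.log((lam1 - xi1 ** 2 / m) / (lam - xi ** 2 / m)))

def estimates(m, zeta, eta, xi0, lam0, reps, seed=0):
    """Direct and likelihood-ratio estimates of P{tau <= m - 3}, each with a standard error."""
    rng = random.Random(seed)
    b, xi, lam = m * zeta, m * xi0, m * lam0
    xi1 = m * (2 * (zeta + eta) - xi0)
    lam1 = m * (lam0 + 4 * eta * (zeta + eta - xi0))
    m_prime = m - 3
    hits = sum(first_passage(sample_conditional(m, xi, lam, rng), b, eta, m_prime) is not None
               for _ in range(reps))
    p_direct = hits / reps
    se_direct = math.sqrt(p_direct * (1 - p_direct) / reps)
    weights = []
    for _ in range(reps):
        hit = first_passage(sample_conditional(m, xi1, lam1, rng), b, eta, m_prime)
        weights.append(0.0 if hit is None else math.exp(log_lr(*hit, m, xi, lam, xi1, lam1)))
    p_lr = sum(weights) / reps
    se_lr = math.sqrt(sum((w - p_lr) ** 2 for w in weights) / ((reps - 1) * reps))
    return p_direct, se_direct, p_lr, se_lr

p_direct, se_direct, p_lr, se_lr = estimates(m=20, zeta=0.25, eta=0.0, xi0=0.0, lam0=1.0, reps=20000)
print(f"direct {p_direct:.4f} +/- {se_direct:.4f}   likelihood ratio {p_lr:.4f} +/- {se_lr:.4f}")
```

Taking m' = m − 2 instead admits τ = m − 2, where the exponent (m − n − 3)/2 equals −1/2. A late crossing with A_n(ξ, λ) close to zero can then receive an enormous weight, which is the instability discussed above.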

Table 4 gives some numerical results. The first approximation in each row is (24),
and for comparison the approximation (20) for the case of known variance is also given.
The Monte Carlo estimates are based on the identity (A.5) with m' = m − 3 in the fifth
column, upper entry, and m' = m − 2 in the fifth column, lower entry. They result from
a 2500 repetition experiment and are given plus or minus an estimated standard error.
The likelihood ratio was truncated to prevent overflow. In some cases the lower entry is
somewhat larger, and its estimated standard error much larger, than the upper entry. This
phenomenon is undoubtedly a result of the instability of the Monte Carlo procedure
described above. In these cases a Monte Carlo estimate based on a direct frequency count
in a 9999 repetition experiment is given in the sixth column. In all cases it suggests that
the use of (A.5) with m' = m − 3 is preferable to m' = m − 2. When one takes the substantially
smaller standard error into account, use of (A.5) is roughly twenty-five to one hundred times
as efficient as direct Monte Carlo.
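The efficiency figure can be reproduced from Table 4, under our reading that efficiency means the ratio of variances per replication: p(1 − p) for a direct frequency count, and n × (standard error)² for the (A.5) estimate with n = 2500. The sketch uses the rows of Table 4 that report a direct estimate, together with the upper (m' = m − 3) entries for (A.5); the helper name is hypothetical.

```python
# (m, b, (A.5) estimate with m' = m - 3, its standard error, direct estimate) from Table 4
rows = [
    (20, 5.0, 0.0449, 0.0006, 0.0444),
    (20, 4.0, 0.1414, 0.0014, 0.1368),
    (15, 5.0, 0.0114, 0.0002, 0.0116),
    (15, 4.5, 0.0329, 0.0005, 0.0288),
]
N_LR = 2500  # repetitions behind each (A.5) entry

def relative_efficiency(se_lr, p_direct, n_lr=N_LR):
    """Variance per replication of a direct frequency count divided by that of the (A.5) estimator."""
    return p_direct * (1 - p_direct) / (n_lr * se_lr ** 2)

for m, b, p_lr, se_lr, p_direct in rows:
    print(f"m = {m}, b = {b}: efficiency about {relative_efficiency(se_lr, p_direct):.0f}")
```

The ratios come out at about 47, 24, 115, and 45, consistent with the range of roughly twenty-five to one hundred quoted above, given that the standard errors in Table 4 are rounded to one significant figure.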

The analytic approximation is reasonably good in all cases, although it deteriorates
somewhat at the smaller sample sizes. It is interesting that the known sigma approximation
(20) is sometimes larger and sometimes smaller than (24). It is also reasonably accurate,
but since neither approximation is onerous to compute, one may as well use the theoretically
appropriate one.

Table 4

Approximations to P{τ < m}

               Analytic           Monte Carlo
 m      b      (24)     (20)      (A.5)             Direct
 40     8.01   .0237    .0250     .0236 ± .0002
                                  .0234 ± .0002
 20     6.0    .0094    .0131     .0097 ± .0002
                                  .0098 ± .0002
 20     5.0    .0442    .0043     .0449 ± .0006     .0444
                                  .0493 ± .0018
 20     4.0    .1366    .1224     .1414 ± .0014     .1368
                                  .1810 ± .0091
 15     5.0    .0104    .0157     .0114 ± .0002     .0116
                                  .0121 ± .0004
 15     4.5    .0287    .0319     .0329 ± .0005     .0288
                                  .0482 ± .0035